First we create the following two files for each project:
$ git shortlog -ns > sympy-all.txt
$ git shortlog -ns --since="1 year ago" > sympy-year.txt
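The two commands above can also be scripted when there are several projects. The following is only a sketch: the helper names `shortlog_command` and `write_shortlogs` and the checkout path in the final comment are hypothetical, and it assumes each project is already cloned locally.

```python
import subprocess
from pathlib import Path

def shortlog_command(since=None):
    """Build the `git shortlog -ns` command, optionally limited by --since."""
    cmd = ["git", "shortlog", "-ns"]
    if since is not None:
        cmd.append("--since=%s" % since)
    # Name the revision explicitly: when standard input is not a terminal,
    # `git shortlog` without a revision summarizes a log read from stdin
    # instead of walking the repository history.
    cmd.append("HEAD")
    return cmd

def write_shortlogs(project, repo_dir, out_dir="."):
    """Write <project>-all.txt and <project>-year.txt for one checkout."""
    for suffix, since in [("all", None), ("year", "1 year ago")]:
        result = subprocess.run(shortlog_command(since), cwd=repo_dir,
                                capture_output=True, text=True, check=True)
        Path(out_dir, "%s-%s.txt" % (project, suffix)).write_text(result.stdout)

# Example call (hypothetical checkout location):
# write_shortlogs("sympy", "repos/sympy")
```

The output files use the same `<project>-all.txt` and `<project>-year.txt` naming as the shell commands, so the rest of the notebook can read them unchanged.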
Then we load the files and create various plots. First we analyze only the last year.
In [6]:
%pylab inline
def get_data(filename):
    # The first column of each `git shortlog -ns` line is the commit count.
    with open(filename) as f:
        return array([int(l.split()[0]) for l in f])
The linear tail on a log-linear graph shows that each project's distribution of patches per person has an exponential tail:
In [7]:
for project in ["sympy", "ipython", "numpy", "matplotlib", "django"]:
    data = get_data("%s-year.txt" % project)
    #data = data / float(sum(data))
    #data = data / float(data[0])
    semilogy(data, lw=2, label="%s last year" % project)
legend()
grid()
xlabel("individual people")
ylabel("total number of patches")
xlim([0, 130]);
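The exponential-tail reading can be checked numerically: if the counts fall off as a*exp(-b*i) in the rank i, then log(count) is linear in i, so a straight-line fit to the logarithm recovers a and b. The sketch below is one way to do that with `numpy.polyfit`; the function name `fit_exponential_tail` and the `start` cutoff are hypothetical, and the data is synthetic, not taken from any project.

```python
import numpy as np

def fit_exponential_tail(counts, start=0):
    """Fit counts[i] ~ a * exp(-b * i) for ranks i >= start; return (a, b).

    Assumes all counts are positive, which holds for shortlog output.
    """
    counts = np.asarray(counts, dtype=float)
    ranks = np.arange(len(counts))
    # A straight line in log space corresponds to exponential decay.
    slope, intercept = np.polyfit(ranks[start:], np.log(counts[start:]), 1)
    return np.exp(intercept), -slope

# Synthetic check with made-up parameters (not project data).
ranks = np.arange(50)
synthetic = 200.0 * np.exp(-0.1 * ranks)
a, b = fit_exponential_tail(synthetic, start=10)
```

With real data the `start` cutoff would be chosen by eye from the plot, since the first few contributors typically sit above the straight tail.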
Including the Linux kernel:
In [8]:
for project in ["sympy", "ipython", "numpy", "mpl", "django", "linux"]:
    data = get_data("%s-year.txt" % project)
    #data = data / float(sum(data))
    #data = data / float(data[0])
    semilogy(data, lw=2, label="%s last year" % project)
legend()
grid()
xlabel("individual people")
ylabel("total number of patches")
#xlim([0, 130]);
The same graph restricted to the first 130 people:
In [9]:
for project in ["sympy", "ipython", "numpy", "mpl", "django", "linux"]:
    data = get_data("%s-year.txt" % project)
    #data = data / float(sum(data))
    #data = data / float(data[0])
    semilogy(data, lw=2, label="%s last year" % project)
legend()
grid()
xlabel("individual people")
ylabel("total number of patches")
xlim([0, 130]);
And restricted to the first 20 people:
In [10]:
for project in ["sympy", "ipython", "numpy", "matplotlib", "django", "linux"]:
    data = get_data("%s-year.txt" % project)
    #data = data / float(sum(data))
    #data = data / float(data[0])
    semilogy(data, lw=2, label="%s last year" % project)
legend()
grid()
xlabel("individual people")
ylabel("total number of patches")
xlim([0, 20]);
We can normalize the curves by the total number of patches:
In [11]:
for project in ["sympy", "ipython", "numpy", "mpl", "django", "linux"]:
    data = get_data("%s-year.txt" % project)
    data = data / float(sum(data))
    #data = data / float(data[0])
    semilogy(data, lw=2, label="%s last year" % project)
legend()
grid()
xlabel("individual people")
ylabel("relative number of patches")
xlim([0, 130]);
ylim([1e-4, 1]);
Or by the number of patches of the most active contributor:
In [12]:
for project in ["sympy", "ipython", "numpy", "mpl", "django", "linux"]:
    data = get_data("%s-year.txt" % project)
    #data = data / float(sum(data))
    data = data / float(data[0])
    semilogy(data, lw=2, label="%s last year" % project)
legend()
grid()
xlabel("individual people")
ylabel("number of patches relative to \nthe most active contributor")
xlim([0, 130]);
ylim([1e-4, 1]);
In [13]:
for project in ["sympy", "ipython", "numpy", "mpl", "django", "linux"]:
    data = get_data("%s-year.txt" % project)
    #data = data / float(sum(data))
    data = data / float(data[0])
    semilogy(data, lw=2, label="%s last year" % project)
legend()
grid()
xlabel("individual people")
ylabel("number of patches relative to \nthe most active contributor")
xlim([0, 20]);
ylim([1e-2, 1]);
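The two normalizations used above differ only in the divisor. As a minimal sketch, assuming the counts are sorted in descending order as `git shortlog -ns` prints them, they can be wrapped in one helper; the name `normalize` and its `by` argument are hypothetical, and the sample counts are made up.

```python
import numpy as np

def normalize(counts, by="total"):
    """Scale commit counts by the project total or by the top contributor."""
    counts = np.asarray(counts, dtype=float)
    if by == "total":
        # Each entry becomes that person's share of all patches.
        return counts / counts.sum()
    if by == "top":
        # Each entry becomes a fraction of the most active contributor,
        # who is first because the input is sorted in descending order.
        return counts / counts[0]
    raise ValueError("by must be 'total' or 'top'")

counts = [40, 30, 20, 10]  # made-up sample, not project data
shares = normalize(counts, by="total")
relative = normalize(counts, by="top")
```

For this sample, `shares` sums to 1 and `relative` starts at 1, which is why both normalized plots use an upper y-limit of 1.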
Now we draw the same graphs for all patches (not just those from the last year):
In [76]:
for project in ["sympy", "ipython", "numpy", "matplotlib", 'sklearn', 'pandas', 'scipy']:
    data = get_data("%s-all.txt" % project)
    #data = data / float(sum(data))
    #data = data / float(data[0])
    data = np.append(data, [0.55]*(300 - len(data)))
    semilogy(data, lw=2, label="%s all" % project)
legend()
grid()
xlabel("individual people")
ylabel("total number of patches")
xlim([0, 300]);
ylim([0.6, 1e4]);
savefig("commits-all.pdf")
In [58]:
for project in ["sympy", "ipython", "numpy", "matplotlib", "django", "linux"]:
    data = get_data("%s-all.txt" % project)
    data = data / float(sum(data))
    #data = data / float(data[0])
    semilogy(data, lw=2, label="%s all" % project)
legend()
grid()
xlabel("individual people")
ylabel("relative number of patches")
xlim([0, 130]);
ylim([1e-4, 1]);
In [11]:
for project in ["sympy", "ipython", "numpy", "mpl", "django", "linux"]:
    data = get_data("%s-all.txt" % project)
    #data = data / float(sum(data))
    data = data / float(data[0])
    semilogy(data, lw=2, label="%s all" % project)
legend()
grid()
xlabel("individual people")
ylabel("number of patches relative to \nthe most active contributor")
xlim([0, 130]);
ylim([1.5e-4, 1]);
In [75]:
for project in ["sympy", "ipython", "numpy", "matplotlib", "sklearn", "pandas", "scipy"]:
    data = get_data("%s-year.txt" % project)
    #data = data / float(sum(data))
    #data = data / float(data[0])
    plot(data, lw=2, label="%s last year" % project)
axhline(50, lw=1, color='k', linestyle='--')
legend()
grid()
xlabel("Individual committer")
ylabel("# of commits")
xlim([0, 25]);
#ylim([0, 1]);
savefig("commits1.pdf")
In [74]:
for project in ["sympy", "ipython", "numpy", "matplotlib", "sklearn", "pandas", "scipy"]:
    data = get_data("%s-year.txt" % project)
    #data = data / float(sum(data))
    data = data / float(data[0])
    plot(data, lw=2, label="%s last year" % project)
axhline(0.1, lw=1, color='k', linestyle='--')
legend()
grid()
xlabel("Individual committer")
ylabel("Commit rate")
xlim([0, 25]);
#ylim([0, 1]);
savefig("commits2.pdf")